Contaminations in genomic sequences
Abstract
Despite continued advances in whole-genome sequencing techniques and the development of powerful assembly algorithms, newly sequenced genomes still often suffer from contamination introduced during the sequencing process. The most common sources of contamination are accessory DNAs deliberately attached to the DNA/RNA under investigation, including vectors, adapters, linkers and PCR primers. There are also unintended events, e.g. caused by transposon activity or simply by impurities, that lead to contaminated genomic sequences. These may result in misassemblies of genomic sequences, meaningless analyses and potentially erroneous conclusions. However, the extent to which publicly available genomes are contaminated is unknown. To address this unsatisfactory situation, we plan to develop a comparative genomics approach to broadly identify contamination in available genomic sequences. Some existing tools can find contamination from adapters or from vectors alone. Here we present an approach based on machine learning that distinguishes between contaminated and non-contaminated sequences, rather than detecting only vector or only adapter contamination. As no such tool is currently available, our approach would be the first of its kind, and it showed promising results on different datasets.
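The abstract does not say which features or which learning algorithm the authors use. Purely as an illustration of what "distinguishing contaminated from non-contaminated sequences" can look like as a classification problem, the following minimal sketch assumes k-mer frequency profiles as features and a nearest-centroid rule as the classifier. Both choices, and all function names (`kmer_profile`, `train`, `classify`), are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=3):
    """Return normalized k-mer frequencies of seq over the ACGT alphabet.

    Assumption: k-mer composition is used as the feature vector; the
    abstract does not specify the features of the actual method.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing other symbols (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def centroid(profiles):
    """Component-wise mean of a list of equally long feature vectors."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(clean_seqs, contaminated_seqs, k=3):
    """Build one centroid per class from labelled example sequences."""
    return {
        "clean": centroid([kmer_profile(s, k) for s in clean_seqs]),
        "contaminated": centroid([kmer_profile(s, k) for s in contaminated_seqs]),
    }


def classify(model, seq, k=3):
    """Assign seq to the class whose centroid is closest to its profile."""
    profile = kmer_profile(seq, k)
    return min(model, key=lambda label: distance(model[label], profile))


if __name__ == "__main__":
    # Toy, made-up sequences: AT-rich stands in for the host genome,
    # GC-rich for foreign DNA. Real data would be far less separable.
    model = train(
        clean_seqs=["ATATATTATAATATTA", "TTATAATATATTAAT"],
        contaminated_seqs=["GCGCCGGCGCGGCC", "CCGGCGCGCCGG"],
    )
    print(classify(model, "ATTATATAAT"))
    print(classify(model, "GCGCCGGCGC"))
```

A nearest-centroid rule is used here only because it fits in a few lines without external libraries; any supervised classifier could take its place on the same feature vectors.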
Similar resources
A new method to prevent carry-over contaminations in two-step PCR NGS library preparations
Two-step PCR procedures are an efficient and well established way to generate amplicon libraries for NGS sequencing. However, there is a high risk of cross-contamination by carry-over of amplicons from first to second amplification rounds, potentially leading to severe misinterpretation of results. Here we describe a new method able to prevent and/or to identify carry-over contaminations by int...
An efficient and simple CTAB based method for total genomic DNA isolation from low amounts of aquatic plants leaves with a high level of secondary metabolites
An efficient DNA isolation protocol, specifically modified to yield the pure, high-quality DNA required for molecular studies, is reported in this paper. Some aquatic plants (Potamogeton spp., Ceratophyllum demersum and Myriophyllum spicatum) were used for the study. The protocol will be useful for obtaining high yields of pure DNA. Instead of using the available DNA extraction kits, this protocol can ...
Serological and genomic detection of bovine leukemia virus in human and cattle samples
Bovine leukemia virus (BLV) is a retrovirus responsible for lymphoproliferative disorders in cattle. Although infections of BLV in animals are well known, little is known about its capacity to infect humans. This study investigated the presence of anti-BLV antibodies and BLV proviruses in human and cattle samples. An indirect enzyme-linked immunosorbent assay (ELISA) was used to detect anti-BL...
Detection of bacterial contaminants and hybrid sequences in the genome of the kelp Saccharina japonica using Taxoblast
Modern genome sequencing strategies are highly sensitive to contamination making the detection of foreign DNA sequences an important part of analysis pipelines. Here we use Taxoblast, a simple pipeline with a graphical user interface, for the post-assembly detection of contaminating sequences in the published genome of the kelp Saccharina japonica. Analyses were based on multiple blastn searche...
Optimization of the genomic DNA extraction in some mosses
The presence of organic compounds and high amounts of secondary metabolites (polysaccharides, phenolic components, etc.) in mosses causes difficulties in DNA extraction, which are followed by problems in PCR reactions. In lower plants, various methods have been used for DNA extraction, including silica gel and different commercial kits. These methods mostly use hazardous (like phenol or liquid nitrog...